Fraud Detection with the SQL Server Suite
Author: Dejan Sarka
Reviewer:
Abstract
Introduction
The SolidQ Approach to Projects
Data Preparation
Data Overview
Data Mining Models
The Continuous Learning Cycle
The Results
Conclusion
References
About the Author
About SolidQ
Management Summary

The most significant element of SolidQ’s approach to fraud detection is the continuous learning cycle. We focus on the Microsoft SQL Server suite as the fraud detection toolset because it includes all of the software needed to create an appropriate fraud detection infrastructure. The service is performed as mentoring: actual work and consulting, together with the transfer of knowledge to the customer’s employees. Once the project is completed, or sometimes as soon as the proof-of-concept (POC) project is completed, the customer can deploy the fraud detection system into production. The infrastructure is set up to support continuous learning, which means that the customer can keep improving the system after deployment and address the ever-changing circumstances of the particular business throughout its use.

Two principal techniques are employed: supervised (directed) models and unsupervised (undirected) models. Supervised data mining algorithms try to explain the values of the flags with which existing fraudulent transactions have been marked. Once the patterns and rules that lead to fraudulent behavior are identified, they can be used to predict fraudulent behavior in new transactions. Unsupervised techniques analyze data without any prior knowledge. With unsupervised models we establish a way to control the supervised ones; together with OLAP models or DW reports, this forms the foundation of the continuous learning infrastructure.

SolidQ suggests starting with a proof-of-concept (POC) project, which takes between 5 and 10 working days. Besides SolidQ’s data mining mentor (expert), the team should include a subject matter expert (SME) and at least one information technology (IT) expert. We can always replace the IT part of the team with SolidQ people; however, we cannot conduct a project alone, without an SME.
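The interplay of the two techniques described in the summary can be illustrated with a minimal sketch. It is not SolidQ's implementation and does not use the SQL Server data mining algorithms. It assumes a hypothetical toy data set with two variables (`amount`, `hour`), a one-variable threshold rule standing in for the supervised model, and a robust distance from the median standing in for the unsupervised model. All names, values, and the cutoff of 2.0 are illustrative assumptions.

```python
from statistics import median

# Hypothetical historical transactions: (amount, hour_of_day, fraud_flag).
# The flag is known only for these historical rows.
history = [
    (20.0, 10, 0), (35.0, 14, 0), (15.0, 9, 0), (40.0, 16, 0),
    (25.0, 11, 0), (30.0, 13, 0), (900.0, 3, 1), (850.0, 2, 1),
]


def train_supervised(rows):
    """Supervised stand-in: pick the amount threshold that best explains the flag."""
    amounts = sorted({row[0] for row in rows})
    best_threshold, best_correct = None, -1
    for low, high in zip(amounts, amounts[1:]):
        candidate = (low + high) / 2
        correct = sum((row[0] > candidate) == bool(row[2]) for row in rows)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def train_unsupervised(rows):
    """Unsupervised stand-in: describe typical values without looking at the flag."""
    profile = []
    for column in zip(*[(row[0], row[1]) for row in rows]):
        center = median(column)
        # Median absolute deviation; fall back to 1.0 if the column is constant.
        spread = median([abs(v - center) for v in column]) or 1.0
        profile.append((center, spread))
    return profile


def anomaly_score(profile, amount, hour):
    """Largest robust deviation from the typical value across both variables."""
    return max(abs(v - c) / s for v, (c, s) in zip((amount, hour), profile))


def triage(threshold, profile, amount, hour, cutoff=2.0):
    """Use the unsupervised score as a check on the supervised prediction."""
    if amount > threshold:
        return "predicted fraud"
    if anomaly_score(profile, amount, hour) > cutoff:
        return "review: unusual but not predicted"
    return "ok"


threshold = train_supervised(history)
profile = train_unsupervised(history)
for amount, hour in [(870.0, 2), (30.0, 3), (28.0, 12)]:
    print(amount, hour, triage(threshold, profile, amount, hour))
```

The middle transaction is the interesting case: the supervised rule has only learned about large amounts, so a small transaction at an unusual hour passes it, while the unsupervised score marks it for review. Cases like this are candidates for new flags, which is one way the continuous learning cycle could feed back into the supervised model.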
Steps:
• Training (optional, but strongly recommended) (1-2 days)
• Data Preparation (2-3 days together with the Data Overview)
  o Selection of data; building computed variables; sampling; handling of missing values and outliers; categorization.
• Data Overview (2-3 days together with Data Preparation)
  o Checking the distribution of variables; finding dependencies between variables; measuring the amount of information in variables (entropy).
• Building and Evaluating Data Mining Models (2 days)
  o Decision Trees; Neural Networks; Naïve Bayes; Clustering
• Initial preparation of the continuous learning infrastructure (1 day)
• Presentation of results (1 day)

POC Benefits:
• The customer learns how fraudulent behavior manifests itself in operational data
• The customer’s employees learn how to perform the entire maintenance cycle on their own, which means that additional engagements by SolidQ would only be required in the unlikely event that the complexity of the problem grew unexpectedly
  o Analysts with appropriate subject matter expertise can perform additional in-depth analyses
  o IT experts can perform data extraction and preparation much more efficiently
  o Both groups of employees learn how to employ creativity to further improve the process and the procedures
• Improved data quality
• Improved employee job satisfaction, as each employee can see how to contribute proactively to the central knowledge about fraud patterns in the company
• With more employees involved in the enterprise, more and better fraud detection patterns can be developed
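Two of the items in the steps above, categorization of a continuous variable and the amount of information in a variable (entropy), can be sketched in a few lines. This is an illustrative sketch only, not the procedure from the whitepaper: the function names are hypothetical, the binning is a simple rank-based equal-height split (ties may land in different bins), and entropy is the standard Shannon entropy in bits.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a discrete variable."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def equal_height_bins(values, n_bins):
    """Categorize a continuous variable into bins with roughly equal row counts.

    Returns the bin index (0 .. n_bins - 1) for each input value, in input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


# Hypothetical variables for eight transactions.
fraud_flag = [0, 0, 0, 0, 0, 0, 1, 1]
amount = [20.0, 35.0, 15.0, 40.0, 25.0, 30.0, 900.0, 850.0]

amount_bins = equal_height_bins(amount, 4)
print("bins:", amount_bins)
print("entropy of flag:", round(entropy(fraud_flag), 4))
print("entropy of binned amount:", entropy(amount_bins))
```

A variable whose entropy is close to zero takes nearly the same value in every row and carries little information for a model, while equal-height binning pushes a categorized variable toward the maximum entropy for its number of bins.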
Publication date: 2013